NBLAST score matrix builder v2 #69
Conversation
I think that ScoringFunction could reasonably be replaced.
Order of operations is another possible question. Currently, while building the lookup table, each query-target pair is assessed and its array of output scores is digitized into the array of counts. This has a low memory overhead because we don't keep the raw scores around for very long; they're compressed into the table as soon as possible. However, it might be convenient not to define the digitizers until after we have all the raw scores - the boundaries could then be learned from that data (alluded to in …).
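For concreteness, here's a minimal sketch of the digitize-as-you-go accumulation described above, using plain numpy; the boundary values and names are illustrative, not the PR's actual code.

```python
import numpy as np

# Illustrative boundaries; in the PR these come from the lookup's digitizers.
dist_bounds = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
dot_bounds = np.array([0.25, 0.5, 0.75])

# One cell per bin, including the half-open bins beyond each outer boundary.
counts = np.zeros((len(dist_bounds) + 1, len(dot_bounds) + 1), dtype=np.int64)

def accumulate(dists: np.ndarray, dots: np.ndarray) -> None:
    """Compress one query-target pair's raw scores into the count table,
    so the raw arrays can be thrown away immediately afterwards."""
    rows = np.digitize(dists, dist_bounds)  # bin index per point match
    cols = np.digitize(dots, dot_bounds)
    np.add.at(counts, (rows, cols), 1)      # unbuffered in-place scatter-add
```

Learning the boundaries after the fact would instead mean holding every pair's raw dists/dots arrays (in memory or on disk) before `dist_bounds`/`dot_bounds` could be chosen, which is the memory trade-off mentioned above.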
There might also be a bit of mileage in parallelising the scoring. Currently that part of the process hardly registers, but it might with more complex scoring functions and/or bigger matrices.
I was wondering if we should provide a system similar to the bridging transforms that users/downstream packages can use to register scoring functions. Users could then invoke NBLAST with e.g. ….
I don't think there's much of a need for that - the mapping from a score function reference to the score function itself would just be a dict. If users already have a handle to the score function, they can use it directly, with no need for us to impose an additional naming scheme; or they can maintain the dict themselves if they're working with many scoring functions at the same time.
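A minimal sketch of that "it's just a dict" point; the matrix names and lookup objects are hypothetical, and this assumes `navis.nblast`'s `smat` argument accepts a user-built lookup, which is what this PR enables.

```python
import navis

# Hypothetical handles to lookups the user built or loaded themselves.
my_lookup = ...        # e.g. parsed from an existing CSV
my_other_lookup = ...  # e.g. trained on the user's own pairs

score_funcs = {
    "adult-lr-v1": my_lookup,
    "larva-lr-v2": my_other_lookup,
}

# queries/targets are Dotprops as usual; pass the handle straight through,
# no central registry required.
scores = navis.nblast(queries, targets, smat=score_funcs["larva-lr-v2"])
```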
I've just given this a go with the lateral pairs in seymour's brain. It doesn't take nearly as long as I thought it would to build the score matrix, and said matrix looks pretty sane, even with all the bins determined automatically. That is, if the points are close enough they don't really care about orientation, but if they're further away then the orientation starts to matter more. The distance boundaries increase in a somewhat geometric way; interestingly, the dot product boundaries are a sort of reverse geometric (the bins are large for low values and small for high values, i.e. we don't care whether two orientations are quite different or very different, but we do care whether they're similar or very similar).
I think this should be functional and documented, including an ipynb tutorial, but I'm not entirely sure whether I've got the indexing right for that - where could I find the PR build of the docs? |
Force-pushed from 13dc375 to 54b1da0.
Quick question: I'd be keen to produce a new score matrix for ….
Yes, 1D is simple - just a ….

The clearest way to do it would probably be for pre and post to be two scoring functions. Possibly the ….

However, if your distance scale was appropriate for both, and you just wanted them to drop into different slots, you could hack a Lookup2D for this purpose as well: if your point match function for two connector tables queried pre to pre and post to post, it could then return a tuple of ….

Alternatively, you could query all synapses to all synapses, and your second axis could then be "is the nearest synapse of the same type"; that column would basically just be set to 0 if they're not. Or you could split it into 4 columns, in case there are different values for matching Q pre to T pre, Q pre to T post, Q post to T post, Q post to T pre.
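A hedged sketch of that last idea, assuming connector tables are pandas DataFrames with x/y/z and type columns; `match_synapses` is a hypothetical name, not part of this PR.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_synapses(query_syn, target_syn):
    """For each query synapse, find the nearest target synapse, returning
    (distance, is-same-type) to feed a 2D lookup whose second axis is the
    binary 'same type' column described above."""
    tree = cKDTree(target_syn[["x", "y", "z"]].to_numpy())
    dists, idx = tree.query(query_syn[["x", "y", "z"]].to_numpy())
    same_type = (
        query_syn["type"].to_numpy() == target_syn["type"].to_numpy()[idx]
    ).astype(float)
    return dists, same_type
```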
The …. I think this is ready to go in, but as it's a pretty significant change I'll hold off until you get a chance to take a look!
Thanks! Will attempt to have a look Wed or Thu! |
Looks good to me. One thing that came to my mind though is this: as is, NBLAST will complain if data is not in microns. That made sense as long as there was only the FCWB score matrix, but it might become a nuisance/confusing if people use their own. I can see two options to deal with this: …
I'm slightly in favour of option 1. Otherwise happy for you to proceed and merge. |
Digitizers could optionally have a unit defined on them at instantiation; the Nblaster could then check the Lookup's digitizers for such constraints. This would work fine for current usage, I think. At the end of the day the match function, lookup table, and lookup axes are all pretty tightly bound, but the blasting algorithm would have to change a lot before this level of genericness became a problem.
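A minimal sketch of that optional-unit idea using pint (which comes up later in this thread); this Digitizer is a stand-in, not the real class.

```python
import pint

ureg = pint.UnitRegistry()

class Digitizer:
    """Stand-in showing only the optional-unit constraint."""

    def __init__(self, boundaries, unit=None):
        self.boundaries = boundaries
        # None means "no constraint", e.g. for dimensionless dot products
        self.unit = ureg.Unit(unit) if unit is not None else None

    def check_unit(self, data_unit):
        """Raise if incoming data's unit conflicts with this axis's unit;
        either side being None is treated as unconstrained."""
        if self.unit is None or data_unit is None:
            return
        if ureg.Unit(data_unit) != self.unit:
            raise ValueError(
                f"data unit {data_unit!r} conflicts with axis unit {self.unit!s}"
            )
```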
Hm, this turned into a whole thing 😆

The sensible place to do unit checking is when dotprops are appended to the Nblaster, but partitioned nblasts, where the same dotprops may be added to multiple nblasters, could make this very noisy. I see ….

Currently I'm going with adding a ….

To make things a little more complicated, and as I think you've found already, ….

Non-isometric units are also a bit tricky; it would be nice to just unitise that all away before it gets anywhere close to NBLAST, but again, doing that multiple times would slow things down.
Maybe there could be ….

This function could basically be used wherever ….
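Since the comment above is truncated, here is only a guess at one possible shape for append-time checking plus a reusable helper, building on the hypothetical `Digitizer.check_unit` sketch from the previous comment; none of this is navis's real API.

```python
def check_units(dotprops, lookup):
    """Run every axis's unit constraint against one dotprops object."""
    unit = getattr(dotprops, "units", None)  # None if the data is unitless
    for digitizer in lookup.digitizers:
        digitizer.check_unit(unit)

class NBlaster:
    def __init__(self, lookup):
        self.lookup = lookup
        self.neurons = []

    def append(self, dotprops):
        # Checking here is simple, but gets noisy for partitioned nblasts
        # where the same dotprops are appended to many nblasters.
        check_units(dotprops, self.lookup)
        self.neurons.append(dotprops)
```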
Oh haha, I didn't mean for you to go down that rabbit hole.

Re a …: perhaps the ….

Re ….

Re …: see line 486 in 3e59e41.
Sorry for the messy collection of thoughts. At the end of the day, I'd be perfectly happy with only checking units when ….
Force-pushed from 27d5e44 to bdd7012.
I put this aside as I ran into a couple of pint bugs; the fixes are due to be released any day now (hgrecco/pint#1450). However, that release also drops support for Python 3.7 👀
FWIW: I'm game for dropping 3.7 |
This reverts commit 62b8094.
I've stripped out the unit-checking for now as I think this is viable without it; it could be added later with the new pint release (which has just come out). |
👍 ready to merge when you are! |
Rising from the grave...
Differences from #28:
- Axes are defined by `Digitizer` objects rather than boundaries (see the sketch below). This better encapsulated the logic for parsing and stringifying the digitizers, and made the handling of left/right half-open boundaries and value clamping more flexible. It's a bit more verbose, but most people will be working with existing CSVs anyway so it'll all be hidden.
- … `ScoringFunction.max_dist` …; alternatively they pre/append -inf and inf.

And a big training dataset (L/R pairs in the L1 brain) is within reach thanks to @swilson27.
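As a rough illustration of the first bullet, here's what such a `Digitizer` might look like; names, the clamping flags, and the string format are all hypothetical, not the PR's actual implementation.

```python
import numpy as np

class Digitizer:
    def __init__(self, boundaries, clip=(False, False)):
        # clip: whether values beyond the (left, right) outermost boundary
        # are clamped into the first/last bin rather than left out of range.
        self.boundaries = np.asarray(boundaries, dtype=float)
        self.clip = clip

    def __call__(self, values):
        # Bin i covers [boundaries[i], boundaries[i+1]); -1 and n_bins are
        # out-of-range sentinels unless clamping is enabled on that side.
        idx = np.digitize(values, self.boundaries) - 1
        if self.clip[0]:
            idx = np.maximum(idx, 0)
        if self.clip[1]:
            idx = np.minimum(idx, len(self.boundaries) - 2)
        return idx

    def to_strings(self):
        """Stringify bins like '[1.0,2.0)' so they round-trip via CSV."""
        b = self.boundaries
        return [f"[{lo},{hi})" for lo, hi in zip(b[:-1], b[1:])]
```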
Open question: is ScoringFunction sufficiently internal that I can just replace it with the new Lookup2d class, or should it be kept around for compatibility?